Voice Source State as a Source of Information in Speech Recognition: Detection of Laryngealizations
نویسندگان
چکیده
Laryngealizations are irregular voiced portions of speech, which can have morphosyntactic functions and can disturb the automatic computation of F0. Two methods for the automatic detection of laryngealizations are described in this paper: With a Gaussian classifier using spectral and cepstral features a recognition rate of 80% (false alarm rate of 8%) could be achieved. As an alternative a “non-standard” method has been developed: an artificial neural network (ANN) was used for non-linear inverse filtering of speech signals. The inversely filtered signal was directly used as input for another ANN, which was trained to detect laryngealizations. In preliminary experiments we obtained a recognition rate of 65% (12% false alarms).
منابع مشابه
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
متن کاملVoice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملLaryngealizations and Emotions: How Many Babushkas?
It has been claimed that voice quality traits including irregular phonation such as creaky voice (laryngealization) serve several functions, amongst them being the marking of emotions; accordingly, they should be used for automatic recognition of these phenomena. However, laryngealizations marking emotional states have mostly been found for acted or synthesized data. First results using real-li...
متن کاملExtraction of Excitation Information from Speech and Its Applications for Expressive Speech Processing
Through speech production mechanism, speech with different voice qualities such as phonations, emotions, expressive singing and other paralinguistic sounds are also produced. Most of these sounds demonstrate these features mostly due to the excitation component (vibration of the vocal folds at the glottis) whereas the dynamic vocal tract system primarily conveys the message. Hence, the excitati...
متن کاملطراحی یک روش آموزش ناموازی جدید برای تبدیل گفتار با عملکردی بهتر از آموزش موازی
Introduction: The art of voice mimicking by computers, has with the computer have been one of the most challenging topics of speech processing in recent years. The system of voice conversion has two sides. In one side, the speaker is the source that his or her voice has been changed for mimicking the target speaker’s voice (which is on the other side). Two methods of p...
متن کامل